
Update pinned numpy in github action #3974

Closed
tarang-jain wants to merge 6 commits into facebookresearch:main from tarang-jain:pinned-numpy

Conversation

@tarang-jain (Contributor)

Pin numpy version to < 2 in github action
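As an aside, the constraint a `numpy<2` pin enforces is simply that the installed major version stays below 2. A minimal sketch of that check (`satisfies_pin` is a hypothetical helper, not part of this PR or of faiss):

```python
# Hypothetical helper: check that a numpy version string satisfies a
# "numpy < 2" style pin, i.e. its major version is below the given bound.
def satisfies_pin(version: str, max_major: int = 2) -> bool:
    # "1.26.4" -> major 1 -> pin satisfied; "2.0.1" -> major 2 -> rejected
    return int(version.split(".")[0]) < max_major

print(satisfies_pin("1.26.4"))  # True: the version pinned in this PR
print(satisfies_pin("2.0.1"))   # False: a numpy 2.x release is excluded
```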

@tarang-jain (Contributor, Author) commented Oct 19, 2024

@asadoughi I am surprised that the segfault in the RAFT builds has appeared so suddenly. My guess is a numpy version mismatch: the conda envs for RAFT 24.06 have numpy<2, which is why I pinned numpy=1.26.4. Running valgrind on the torch tests, I see this:

...==1912667== Conditional jump or move depends on uninitialised value(s)
==1912667==    at 0x1270B55B: at::native::_to_copy(at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)
==1912667==    by 0x1341BD55: c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd___to_copy>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat> > >, at::Tensor (at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)
==1912667==    by 0x12B12C28: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)
==1912667==    by 0x13264C8B: c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>), &at::(anonymous namespace)::_to_copy>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat> > >, at::Tensor (at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)
==1912667==    by 0x12C23375: at::_ops::_to_copy::call(at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, bool, std::optional<c10::MemoryFormat>) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)
==1912667==    by 0x1270969E: at::native::to(at::Tensor const&, c10::ScalarType, bool, bool, std::optional<c10::MemoryFormat>) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)
==1912667==    by 0x135F2CF3: c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, c10::ScalarType, bool, bool, std::optional<c10::MemoryFormat>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd_dtype_to>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::ScalarType, bool, bool, std::optional<c10::MemoryFormat> > >, at::Tensor (at::Tensor const&, c10::ScalarType, bool, bool, std::optional<c10::MemoryFormat>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ScalarType, bool, bool, std::optional<c10::MemoryFormat>) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)
==1912667==    by 0x12DA753C: at::_ops::to_dtype::call(at::Tensor const&, c10::ScalarType, bool, bool, std::optional<c10::MemoryFormat>) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)
==1912667==    by 0x12087C84: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)
==1912667==    by 0x120899F5: at::TensorIteratorBase::build(at::TensorIteratorConfig&) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)
==1912667==    by 0x1208AC24: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)
==1912667==    by 0x1239341E: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)
==1912667==  Uninitialised value was created by a stack allocation
==1912667==    at 0x12087320: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) (in /home/miniconda3/envs/faiss-main/lib/libtorch_cpu.so)

which makes me wonder whether downgrading torch might help. Please let me know if you have any suggestions. This exact same action was working earlier, right? If the GitHub action itself was unchanged, this points to a version-compatibility problem among the packages, since the action does not pin versions for any of them.

@asadoughi (Contributor)

We can look into pinning versions for all packages involved in the RAFT CI. Do you have a compatible version of torch for RAFT 24.06? More generally, is there a published compatibility matrix for each RAFT release?
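For illustration, full pinning could look something like the step below. This is only a sketch of a GitHub Actions fragment: the package set, the versions, and the channels are hypothetical and would need to be confirmed against what the RAFT 24.06 CI actually requires.

```yaml
# Hypothetical workflow step: pin every package the RAFT CI installs.
# Versions shown are illustrative, not a verified compatibility set.
- name: Install pinned test dependencies
  run: |
    conda install -y -q \
      "numpy=1.26.4" \
      "pytorch=2.2.*" \
      "scipy=1.13.*" \
      -c pytorch -c conda-forge
```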

facebook-github-bot pushed a commit that referenced this pull request Oct 22, 2024
Summary:
Related to testing in #3974

Based on comparing the logs of two runs:
- failing: https://github.com/facebookresearch/faiss/actions/runs/11409771344/job/31751246207
- passing: https://github.com/facebookresearch/faiss/actions/runs/11368781432/job/31625550227

Pull Request resolved: #3980

Reviewed By: junjieqi

Differential Revision: D64778154

Pulled By: asadoughi

fbshipit-source-id: f4e53fed3850f3e0f391015c0349ee14da68330a
tarang-jain closed this Jan 3, 2025
saadala commented Jan 3, 2025 via email
